

1 .  INTRODUCTION 


Many of the problems in decision and control theory involve estimation and optimization. Several methods, including self-tuning regulators and model reference adaptive schemes, are available for estimating and controlling systems with unknown parameters [1],[2],[3]. The theory for the optimization of a single performance index for both deterministic and stochastic systems with known parameters is well-established [4],[5],[6].

The theory for the identification and control of systems with several decision makers, each having different information available and each having his own performance index, is difficult [7]. There are several conceptual reasons why the classical theories for systems with single cost functions cannot be easily adjusted to handle multiple cost functions. First, it may not be possible to optimize the multiple objectives simultaneously. Second, the information available to each user is not necessarily the same. These problems do not occur for the single objective case.


1.1.  Overview  of  Multi-User  Control  Theory 

The problem of optimizing multiple objective functions has led to the development of several solution concepts [8]. A Pareto-optimal solution is used when there is cooperation among the decision makers. For systems in which cooperation cannot be guaranteed, a Nash solution is employed [9]. Some systems have a structure in which one user is able to enforce his strategy upon another user. A solution concept for this type of system is known as a leader-follower or Stackelberg solution.


The Nash decision strategy arises frequently in systems with multiple decision makers. An inherent property of the Nash strategy is that it prevents decision makers from cheating. Any unilateral deviation by a decision maker from the Nash equilibrium incurs a greater cost for that decision maker. It is clear that the Nash strategy is a rational strategy for systems whose users do not cooperate.

The  Nash  solution  concept  arises  often  in  economic  contexts. 
Consider  firms  competing  against  each  other  in  a  market.  Each  firm  seeks 
a  production  level  for  optimizing  its  cost  function:  profit.  The  firms  do 
not  cooperate  in  determining  production  levels.  A  Nash  strategy  may  also  be 
required  for  many  estimation  and  control  problems.  In  an  estimation  and 
control  scheme  there  may  be  one  performance  index,  e.g.,  minimum  mean  square, 
for  estimating  the  parameters  and  a  different  index,  e.g.,  quadratic,  for 
controlling  the  system.  The  goals  of  these  performance  indices  may  oppose 
each  other,  and  therefore,  a  Nash  solution  is  required.  A  Nash  game  can 
even arise in a leader-follower setting. Consider a hierarchical structure in which there are several followers at the same level in the structure. The leader imposes his strategy but the followers are permitted to compete with each other. In this case, the followers are involved in both leader-follower and Nash games.


1.2.  Determining  Nash  Strategies  Under  Uncertainties 

When the system and cost functions are known to the decision makers, a Nash solution can be found. An explicit closed-form expression for a Nash equilibrium exists for linear systems with quadratic performance indices [10]. A decision maker having information about the plant and the others' objectives can determine both his and the other players' Nash strategies. However, when either the plant is not known or the cost functions are not known to each decision maker, a player cannot determine a priori the Nash equilibrium. This work investigates how a decision maker can use reaction relations of the other decision makers to determine the Nash equilibrium.


1.3.  Organization  of  Thesis 

In Section 2 a linear quadratic game is posed and an equilibrium is proposed. In Section 3 it is shown that the proposed equilibrium is equivalent to a Nash equilibrium. It is proven in Section 4 that algorithms which are updated based upon the error in the estimated state cannot converge to a value different from the Nash equilibrium. In Section 5 an algorithm using reaction relations of the other decision makers is described. Finally, an example using the algorithm is given in Section 6.


2.  A  PROPOSED  EQUILIBRIUM  FOR  A  LINEAR  QUADRATIC  GAME 


In  this  section  a  linear  quadratic  game  is  introduced  and  its 
certainty-equivalent  optimal  inputs  are  determined.  An  equilibrium  for  the 
game  is  proposed  and  the  defined  equilibrium  is  explicitly  calculated. 


2.1.  Formulation  of  the  Linear  Quadratic  Game 

Consider a linear time-invariant, discrete system described by

    $X_{k+1} = A X_k + B_1 U_{1_k} + B_2 U_{2_k} + W_k$    (2.1)

where $X_k$ is the n-dimensional state vector at time k, and $U_{1_k}$ and $U_{2_k}$ are $m_1$- and $m_2$-dimensional input vectors to be chosen by Decision-Maker 1 (DM1) and Decision-Maker 2 (DM2) at time k, respectively. Assume that $(A,B_1)$ and $(A,B_2)$ are controllable. Also assume that $W_k$ is an n-dimensional Gaussian random vector with $E\{W_k\} = 0$ and $E\{W_k W_k'\} = P$, the (n×n) covariance matrix. The single-stage cost function associated with DMi (i = 1,2) at time k is

    $J_{i_k} = (X_{k+1} - C_i)' Q_i (X_{k+1} - C_i) + U_{i_k}' R_i U_{i_k}$    (2.2)

where $R_i$ is an ($m_i$×$m_i$) positive definite matrix, $Q_i$ is an (n×n) positive semi-definite matrix, and $C_i$ is an n-dimensional vector. The state $X_k$ is available to each DM at time k. The plant (2.1) is known to each DM. Each DM knows his cost function parameters, Q, R, and C, but he does not know the other DM's. It is assumed that each DM plays rationally, that is, he chooses his input $U_{i_k}$ to minimize his cost (2.2).

As stated earlier, each DM is attempting to tune his control law to the reaction relations of the other DM. Since cooperation between the DM's cannot be enforced, it is desirable that each DM tune his control to reach a Nash equilibrium. The Nash solution to the N-stage linear quadratic game is given in [10]; however, this solution requires the DM's to know each other's cost function. It is possible that estimates of the cost parameters could be used in a dynamic programming solution to the N-stage problem, but the implementation of the estimation schemes may involve calculating conditional probability distributions, which can be difficult. In contrast, the calculations involved in minimizing the single stage cost function are quite simple. If the DM's play the single stage game over and over while updating their control laws appropriately at each stage, and their control laws converge to the Nash solution, then the goals of the DM's have been met with relatively simple calculations.


2.2.  The  Certainty  Equivalent  Control 

Due to the quadratic nature of the cost functions, linear controls are assumed:

    $U_{1_k} = F_{1_k} X_k + G_{1_k}$    (2.3)

    $U_{2_k} = F_{2_k} X_k + G_{2_k}$    (2.4)

where $F_{i_k}$ (i = 1,2) is an ($m_i$×n) matrix and $G_{i_k}$ is an $m_i$-dimensional vector. Each DM estimates the other's input and then formulates his own input so as to minimize his cost based on his estimate. The principle of certainty equivalence is invoked here: the DM's replace the noise by its mean, estimate each other's inputs, and play optimally for those estimates. The game proceeds to the next stage with the DM's repeating the procedure above. When the DM's estimate each other's input correctly, we say they have reached an equilibrium. It will be shown that this equilibrium is a unique Nash equilibrium.

With assumption (2.4), DM1 views the system (2.1) as

    $\hat X_{1_{k+1}} = (A + B_2 \hat F_{2_k}) X_k + B_2 \hat G_{2_k} + B_1 U_{1_k}$    (2.5)

where $\hat X_{1_{k+1}}$ is DM1's estimate of the next state based on his estimate of DM2's input

    $\hat U_{2_k} = \hat F_{2_k} X_k + \hat G_{2_k}$.    (2.6)

The symbol '^' indicates an estimated value. DM1 sees his cost as

    $J_{1_k} = [(A + B_2 \hat F_{2_k}) X_k + B_2 \hat G_{2_k} + B_1 U_{1_k} - C_1]' Q_1 [(A + B_2 \hat F_{2_k}) X_k + B_2 \hat G_{2_k} + B_1 U_{1_k} - C_1] + U_{1_k}' R_1 U_{1_k}$.    (2.7)

Minimization of (2.7) with respect to $U_{1_k}$ yields

    $U_{1_k} = -(B_1' Q_1 B_1 + R_1)^{-1} B_1' Q_1 [(A + B_2 \hat F_{2_k}) X_k + B_2 \hat G_{2_k} - C_1]$.    (2.8)

The positive semi-definiteness of $Q_1$ and the positive definiteness of $R_1$ guarantee the existence of the inverse in (2.8).

It is seen that DM1's input is a function of his estimate of DM2's input. DM1's input can be decomposed as follows:

    $F_{1_k}(\hat F_{2_k}) = -(B_1' Q_1 B_1 + R_1)^{-1} B_1' Q_1 (A + B_2 \hat F_{2_k})$    (2.9)

    $G_{1_k}(\hat G_{2_k}) = -(B_1' Q_1 B_1 + R_1)^{-1} B_1' Q_1 (B_2 \hat G_{2_k} - C_1)$    (2.10)

Note that DM2's cost parameters $Q_2$, $R_2$, and $C_2$, which are unknown to DM1, do not enter into (2.9) or (2.10). In forming his input, DM1 knows A, $B_1$, $B_2$, $Q_1$, $R_1$, and $C_1$. Once he has obtained his estimates $\hat F_{2_k}$ and $\hat G_{2_k}$, his optimal input is easily obtained. Also note that when the input is decomposed, $F_{1_k}(\hat F_{2_k})$ does not depend on $G_{1_k}(\hat G_{2_k})$ and vice versa. Similar results for $U_{2_k}$ are also obtained:

    $F_{2_k}(\hat F_{1_k}) = -(B_2' Q_2 B_2 + R_2)^{-1} B_2' Q_2 (A + B_1 \hat F_{1_k})$    (2.11)

    $G_{2_k}(\hat G_{1_k}) = -(B_2' Q_2 B_2 + R_2)^{-1} B_2' Q_2 (B_1 \hat G_{1_k} - C_2)$.    (2.12)
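
The reaction relations (2.9)-(2.12) can be illustrated for a scalar plant. The following is a minimal sketch, not part of the original development: the function names `best_response` and `predicted_cost` and all numerical values are assumptions chosen for illustration. It evaluates the scalar forms of (2.9) and (2.10) and confirms that the resulting input does no worse than nearby inputs for the certainty-equivalent cost (2.7).

```python
def best_response(a, b_own, b_other, q, r, c, f_other_hat, g_other_hat):
    """Scalar form of (2.9)-(2.10): gains (f, g) of the input u = f*x + g."""
    alpha = b_own * b_own * q + r  # scalar b'qb + r, positive because r > 0
    f = -b_own * q * (a + b_other * f_other_hat) / alpha
    g = -b_own * q * (b_other * g_other_hat - c) / alpha
    return f, g


def predicted_cost(a, b_own, b_other, q, r, c, u_own, u_other_hat, x):
    """Certainty-equivalent single-stage cost, scalar form of (2.7)."""
    x_next = a * x + b_own * u_own + b_other * u_other_hat
    return q * (x_next - c) ** 2 + r * u_own ** 2


# Illustrative (hypothetical) numbers for DM1.
a, b1, b2, q1, r1, c1 = 1.5, 1.0, 0.5, 2.0, 1.0, 1.0
f2_hat, g2_hat, x = -0.4, 0.2, 3.0

f1, g1 = best_response(a, b1, b2, q1, r1, c1, f2_hat, g2_hat)
u1 = f1 * x + g1
u2_hat = f2_hat * x + g2_hat
j_star = predicted_cost(a, b1, b2, q1, r1, c1, u1, u2_hat, x)
```

Note that only DM1's own cost parameters and his estimates of DM2's gains enter the computation, which is the point made in the text above.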


2.3.  A  Proposed  Equilibrium 

Let the equilibrium be defined as when each DM's estimate of the other's input is correct. At equilibrium we then have

    $F_{1_k}(\hat F_{2_k}) = \hat F_{1_k} = F_{1_e}$

    $F_{2_k}(\hat F_{1_k}) = \hat F_{2_k} = F_{2_e}$

    $G_{1_k}(\hat G_{2_k}) = \hat G_{1_k} = G_{1_e}$

    $G_{2_k}(\hat G_{1_k}) = \hat G_{2_k} = G_{2_e}$    (2.13)

where $F_{i_e}$ and $G_{i_e}$ (i = 1,2) denote the equilibrium solutions. Substituting (2.9)-(2.12) into (2.13) and denoting for ease of notation
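
    $\alpha_1 = B_1' Q_1 B_1 + R_1$

    $\alpha_2 = B_2' Q_2 B_2 + R_2$,    (2.14)

we obtain the equilibrium solutions:

    $F_{1_e} = (I - \alpha_1^{-1} B_1' Q_1 B_2 \alpha_2^{-1} B_2' Q_2 B_1)^{-1} \alpha_1^{-1} B_1' Q_1 (B_2 \alpha_2^{-1} B_2' Q_2 - I) A$    (2.15)

    $F_{2_e} = (I - \alpha_2^{-1} B_2' Q_2 B_1 \alpha_1^{-1} B_1' Q_1 B_2)^{-1} \alpha_2^{-1} B_2' Q_2 (B_1 \alpha_1^{-1} B_1' Q_1 - I) A$    (2.16)

    $G_{1_e} = (I - \alpha_1^{-1} B_1' Q_1 B_2 \alpha_2^{-1} B_2' Q_2 B_1)^{-1} \alpha_1^{-1} B_1' Q_1 (C_1 - B_2 \alpha_2^{-1} B_2' Q_2 C_2)$    (2.17)

    $G_{2_e} = (I - \alpha_2^{-1} B_2' Q_2 B_1 \alpha_1^{-1} B_1' Q_1 B_2)^{-1} \alpha_2^{-1} B_2' Q_2 (C_2 - B_1 \alpha_1^{-1} B_1' Q_1 C_1)$    (2.18)
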

As stated above, $\alpha_1^{-1}$ and $\alpha_2^{-1}$ always exist. From the matrix inversion lemma [11], it is seen that the existence of

    $(I - \alpha_1^{-1} B_1' Q_1 B_2 \alpha_2^{-1} B_2' Q_2 B_1)^{-1}$    (2.19)

implies the existence of

    $(I - \alpha_2^{-1} B_2' Q_2 B_1 \alpha_1^{-1} B_1' Q_1 B_2)^{-1}$.    (2.20)

Hence, the existence of the equilibrium defined by (2.13) hinges on the existence of (2.19). Note that the equilibrium cannot be calculated by either DM a priori since it involves the other DM's cost parameters. If the inverse (2.19) exists, we define the equilibrium plant as the system (2.1) with the equilibrium inputs applied.

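The equilibrium plant is

    $X_{k+1} = \{I + \mu_1(\xi_2 - I) + \mu_2(\xi_1 - I)\} A X_k + \mu_1[C_1 - \xi_2 C_2] + \mu_2[C_2 - \xi_1 C_1]$    (2.21)

where

    $\mu_i = B_i \gamma_i^{-1} \alpha_i^{-1} B_i' Q_i$

    $\xi_i = B_i \alpha_i^{-1} B_i' Q_i$

    $\gamma_1 = I - \alpha_1^{-1} B_1' Q_1 B_2 \alpha_2^{-1} B_2' Q_2 B_1$

    $\gamma_2 = I - \alpha_2^{-1} B_2' Q_2 B_1 \alpha_1^{-1} B_1' Q_1 B_2$.

The steady state of this equilibrium plant is denoted $X_{ss}$ and is given by

    $X_{ss} = [I - \{I + \mu_1(\xi_2 - I) + \mu_2(\xi_1 - I)\} A]^{-1} [\mu_1(C_1 - \xi_2 C_2) + \mu_2(C_2 - \xi_1 C_1)]$.

The steady state exists if the eigenvalues of

    $\{I + \mu_1(\xi_2 - I) + \mu_2(\xi_1 - I)\} A$

are within the unit circle.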

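For the scalar system

    $x_{k+1} = a x_k + b_1 u_{1_k} + b_2 u_{2_k} + w_k$

and cost function

    $j_{i_k} = q_i (x_{k+1} - c_i)^2 + r_i u_{i_k}^2$

we obtain from (2.15)-(2.18)

    $f_{1_e} = -a q_1 b_1 r_2 / \Delta$

    $f_{2_e} = -a q_2 b_2 r_1 / \Delta$

    $g_{1_e} = -q_1 b_1 (q_2 b_2^2 (c_2 - c_1) - c_1 r_2) / \Delta$

    $g_{2_e} = -q_2 b_2 (q_1 b_1^2 (c_1 - c_2) - c_2 r_1) / \Delta$

where $\Delta = r_1 q_2 b_2^2 + r_2 q_1 b_1^2 + r_1 r_2$. Since $\Delta$ is never zero, the equilibrium always exists. The equilibrium plant is

    $x_{k+1} = (a r_1 r_2 x_k + q_1 b_1^2 c_1 r_2 + q_2 b_2^2 c_2 r_1) / \Delta$.

The equilibrium plant is stable if $|a r_1 r_2 / \Delta| < 1$. Equivalently, it is stable if

    $|a| < 1 + \frac{q_1 b_1^2}{r_1} + \frac{q_2 b_2^2}{r_2}$.    (2.29)

If the equilibrium scalar plant is stable, the steady state is

    $x_{ss} = \frac{q_1 b_1^2 r_2 c_1 + q_2 b_2^2 r_1 c_2}{r_1 q_2 b_2^2 + r_2 q_1 b_1^2 + r_1 r_2 - a r_1 r_2}$.    (2.30)

The equilibrium steady state controls, $u_{i_{ss}} = f_{i_e} x_{ss} + g_{i_e}$, are

    $u_{1_{ss}} = \frac{b_1 q_1 (b_2^2 q_2 (c_1 - c_2) + r_2 c_1 (1-a))}{b_1^2 q_1 r_2 + b_2^2 q_2 r_1 + (1-a) r_1 r_2}$    (2.31)

    $u_{2_{ss}} = \frac{b_2 q_2 (b_1^2 q_1 (c_2 - c_1) + r_1 c_2 (1-a))}{b_1^2 q_1 r_2 + b_2^2 q_2 r_1 + (1-a) r_1 r_2}$.    (2.32)
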
Although the existence of the equilibrium plant is guaranteed for the scalar case, the stability of the equilibrium system is not known a priori to either DM unless |a| < 1. If |a| < 1 then (2.29) is trivially satisfied. We note that for $r_1$ or $r_2$ sufficiently small, or $q_1$ or $q_2$ sufficiently large, any initially unstable plant, that is, |a| > 1, will have a stable equilibrium system. A problem posed by this fact is how the DM's would realize the equilibrium does not yield a stable equilibrium system and how the DM's should readjust their r's and q's in order to create a stable equilibrium.
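
The scalar results of this section can be checked numerically. The sketch below is an illustration under assumed values, not part of the original work: the function names and the particular numbers (an open-loop unstable plant with a = 3/2) are hypothetical. It evaluates the scalar equilibrium gains, confirms that they are a fixed point of the scalar reaction relations, and tests the stability condition and the steady state. Exact rational arithmetic is used so the comparisons are free of rounding.

```python
from fractions import Fraction as Fr


def scalar_equilibrium(a, b1, b2, q1, q2, r1, r2, c1, c2):
    """Scalar equilibrium gains f1e, g1e, f2e, g2e and the quantity Delta."""
    a, b1, b2, q1, q2, r1, r2, c1, c2 = map(Fr, (a, b1, b2, q1, q2, r1, r2, c1, c2))
    delta = r1 * q2 * b2**2 + r2 * q1 * b1**2 + r1 * r2
    f1 = -a * q1 * b1 * r2 / delta
    f2 = -a * q2 * b2 * r1 / delta
    g1 = -q1 * b1 * (q2 * b2**2 * (c2 - c1) - c1 * r2) / delta
    g2 = -q2 * b2 * (q1 * b1**2 * (c1 - c2) - c2 * r1) / delta
    return f1, g1, f2, g2, delta


def reaction(a, b_own, b_other, q, r, c, f_other, g_other):
    """Scalar reaction relations: best-response gains to the other DM's gains."""
    alpha = b_own**2 * q + r
    f = -b_own * q * (a + b_other * f_other) / alpha
    g = -b_own * q * (b_other * g_other - c) / alpha
    return f, g


# Hypothetical example: open-loop unstable plant, a = 3/2.
a, b1, b2 = Fr(3, 2), Fr(1), Fr(2)
q1, q2, r1, r2 = Fr(1), Fr(1), Fr(1), Fr(1)
c1, c2 = Fr(1), Fr(2)

f1e, g1e, f2e, g2e, delta = scalar_equilibrium(a, b1, b2, q1, q2, r1, r2, c1, c2)

# Closed-loop pole a*r1*r2/Delta and the stability test |a| < 1 + q1*b1^2/r1 + q2*b2^2/r2.
pole = a * r1 * r2 / delta
stable = abs(a) < 1 + q1 * b1**2 / r1 + q2 * b2**2 / r2

# Steady state of the equilibrium plant.
x_ss = (q1 * b1**2 * r2 * c1 + q2 * b2**2 * r1 * c2) / (delta - a * r1 * r2)
```

With these numbers the open-loop plant is unstable, yet the closed-loop pole lies inside the unit circle, which matches the remark that a sufficiently large q or small r stabilizes the equilibrium system.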


3.  THE  NASH  EQUILIBRIUM  SOLUTION 


In this section it is shown that if the proposed equilibrium in (2.15)-(2.18) exists, then the equilibrium is a Nash equilibrium.


The input strategy $U^* = \{U_1^*, U_2^*, \ldots, U_M^*\}$ is defined to be a Nash equilibrium solution if, for each decision maker m = 1, ..., M, $J_m(U^*) \le J_m(U^{m*}, U_m)$, where $U^{m*} = \{U_1^*, U_2^*, \ldots, U_{m-1}^*, U_{m+1}^*, \ldots, U_M^*\}$. In our case with M = 2, $U^* = \{U_1^*, U_2^*\}$ is a Nash equilibrium if

    $J_1(U_1^*, U_2^*) \le J_1(U_1, U_2^*)$    (3.1)

and

    $J_2(U_1^*, U_2^*) \le J_2(U_1^*, U_2)$    (3.2)

for any $U_1$ and $U_2$. We prove that the equilibrium given by (2.15)-(2.18) is Nash by verifying (3.1) and (3.2).


Suppose that $U_{1_e}$ and $U_{2_e}$ is a Nash equilibrium. Then we must have

    $J_1(U_{1_e}, U_{2_e}) \le J_1(U_{1_e} + \delta_1, U_{2_e})$    (3.3)

and

    $J_2(U_{1_e}, U_{2_e}) \le J_2(U_{1_e}, U_{2_e} + \delta_2)$    (3.4)

for any arbitrary $m_1$-dimensional and $m_2$-dimensional vectors $\delta_1$ and $\delta_2$, respectively. Let

    $U_i = U_{i_e} + \delta_i = F_{i_e} X_k + G_{i_e} + \delta_i$.    (3.5)


Substituting (3.5) and (2.15)-(2.18) into (2.2) we obtain

    $J_i(U_1, U_2) = K_i + \delta_i' (B_i' Q_i B_i + R_i) \delta_i$.    (3.6)

See the Appendix for the expressions for $K_i$. Then

    $J_i(U_{1_e}, U_{2_e}) = J_i(U_1, U_2)|_{\delta_1 = \delta_2 = 0} = K_i$

    $J_1(U_1, U_{2_e}) = J_1(U_1, U_2)|_{\delta_2 = 0} = K_1 + \delta_1' (B_1' Q_1 B_1 + R_1) \delta_1$

    $J_2(U_{1_e}, U_2) = J_2(U_1, U_2)|_{\delta_1 = 0} = K_2 + \delta_2' (B_2' Q_2 B_2 + R_2) \delta_2$.

Since $B_i' Q_i B_i + R_i$ is positive definite, the inequalities (3.3) and (3.4) are met.
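
The argument above says that a unilateral deviation from the equilibrium input raises the deviating DM's cost by a positive definite quadratic form in the deviation. A minimal scalar sketch of that statement follows; the numbers and the helper names `cost` and `deviation_penalty` are assumptions made for illustration. For a scalar plant the penalty should equal exactly $(b_i^2 q_i + r_i)\delta_i^2$.

```python
from fractions import Fraction as Fr

# Hypothetical scalar game.
a, b1, b2 = Fr(3, 2), Fr(1), Fr(2)
q1, q2, r1, r2 = Fr(1), Fr(1), Fr(1), Fr(1)
c1, c2 = Fr(1), Fr(2)

# Scalar equilibrium gains from Section 2.3.
delta = r1 * q2 * b2**2 + r2 * q1 * b1**2 + r1 * r2
f1e = -a * q1 * b1 * r2 / delta
f2e = -a * q2 * b2 * r1 / delta
g1e = -q1 * b1 * (q2 * b2**2 * (c2 - c1) - c1 * r2) / delta
g2e = -q2 * b2 * (q1 * b1**2 * (c1 - c2) - c2 * r1) / delta


def cost(i, u1, u2, x):
    """Noise-free single-stage cost of DMi for inputs u1, u2 at state x."""
    x_next = a * x + b1 * u1 + b2 * u2
    if i == 1:
        return q1 * (x_next - c1) ** 2 + r1 * u1**2
    return q2 * (x_next - c2) ** 2 + r2 * u2**2


def deviation_penalty(i, d, x):
    """Extra cost DMi incurs by deviating alone from equilibrium by d."""
    u1e = f1e * x + g1e
    u2e = f2e * x + g2e
    base = cost(i, u1e, u2e, x)
    if i == 1:
        return cost(1, u1e + d, u2e, x) - base
    return cost(2, u1e, u2e + d, x) - base


penalty_1 = deviation_penalty(1, Fr(1, 3), Fr(3))
penalty_2 = deviation_penalty(2, Fr(-1, 2), Fr(3))
```

Because the penalty is strictly positive for any nonzero deviation, neither DM can lower his own cost by moving away from the equilibrium alone.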


The equilibrium in (2.15)-(2.18) is now known to be a Nash equilibrium. It follows that the equilibrium is unique provided the inverses in (2.15)-(2.18) exist. In solving for the equilibrium we assumed the inverses exist. If they do not exist, there are infinitely many solutions for (2.13). Since we desire an algorithm to converge to the Nash solution, we can only consider systems which yield a unique equilibrium. As in the scalar case, we can pose the problem of how the DM's should readjust their cost parameters to force a unique and stable equilibrium.

We have shown that if a solution exists for (2.13), then the solution is a unique Nash equilibrium. The solution to the finite horizon, N-stage, linear-quadratic Nash game when the plant and cost functions are known is given in [10]. For the case N = 1 and $C_i = 0$, we obtain a unique Nash equilibrium given by

    $F_{i_e} = -R_i^{-1} B_i' Q_i \Lambda^{-1} A$    (3.7)

if the inverse of $\Lambda = I + B_1 R_1^{-1} B_1' Q_1 + B_2 R_2^{-1} B_2' Q_2$ exists. We have shown that the defined equilibrium also exists given the existence of a specific matrix. We have attempted to verify algebraically that (3.7) is equivalent to (2.15) and (2.16). Various matrix identities were tried and the symbolic processor REDUCE [12] was used. Even for the second-order system we were not able to prove equality for the general case. However, when numerical examples were examined, (2.15) and (2.16) yield the same results as (3.7). It is believed that with clever manipulation of (2.15) and (2.16) we could obtain (3.7).
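
A numerical comparison of the kind described above can be sketched in a few lines. The example below is hypothetical and is not one of the original numerical examples: the matrices are assumed values for a second-order plant with one scalar input per DM, so that $B_i'Q_iB_i + R_i$ and the inverse (2.19) reduce to scalars. The matrix inverted in the closed form (3.7) is $I + B_1R_1^{-1}B_1'Q_1 + B_2R_2^{-1}B_2'Q_2$. Exact rational arithmetic is used so that agreement is an equality rather than a tolerance check.

```python
from fractions import Fraction as Fr


def to_frac(M):
    return [[Fr(v) for v in row] for row in M]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def inv2(M):
    """Inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]


def lin(X, Y, s):
    """Return X + s*Y elementwise."""
    return [[x + s * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


I2 = to_frac([[1, 0], [0, 1]])


def gain_from_reaction_fixed_point(A, Bi, Bj, Qi, Qj, ri, rj):
    """F_i from (2.15)/(2.16), specialised to scalar inputs."""
    BiQi = matmul(transpose(Bi), Qi)            # row vector B_i' Q_i
    BjQj = matmul(transpose(Bj), Qj)
    alpha_i = matmul(BiQi, Bi)[0][0] + ri
    alpha_j = matmul(BjQj, Bj)[0][0] + rj
    gamma_i = 1 - matmul(BiQi, Bj)[0][0] * matmul(BjQj, Bi)[0][0] / (alpha_i * alpha_j)
    inner = lin(matmul(Bj, BjQj), I2, -alpha_j)  # alpha_j * (B_j alpha_j^-1 B_j' Q_j - I)
    row = matmul(matmul(BiQi, inner), A)
    return [[v / (gamma_i * alpha_i * alpha_j) for v in row[0]]]


def gain_from_closed_form(A, Bi, Bj, Qi, Qj, ri, rj):
    """F_i from (3.7), with scalar R_i = ri."""
    term_i = matmul(Bi, matmul(transpose(Bi), Qi))
    term_j = matmul(Bj, matmul(transpose(Bj), Qj))
    lam = lin(lin(I2, term_i, 1 / ri), term_j, 1 / rj)
    row = matmul(matmul(matmul(transpose(Bi), Qi), inv2(lam)), A)
    return [[-v / ri for v in row[0]]]


# Hypothetical second-order example with symmetric Q_1, Q_2.
A = to_frac([[2, 1], [1, 0]])
B1, B2 = to_frac([[1], [1]]), to_frac([[0], [1]])
Q1, Q2 = to_frac([[2, 1], [1, 1]]), to_frac([[1, 0], [0, 3]])
r1, r2 = Fr(1), Fr(2)

F1_a = gain_from_reaction_fixed_point(A, B1, B2, Q1, Q2, r1, r2)
F1_b = gain_from_closed_form(A, B1, B2, Q1, Q2, r1, r2)
F2_a = gain_from_reaction_fixed_point(A, B2, B1, Q2, Q1, r2, r1)
F2_b = gain_from_closed_form(A, B2, B1, Q2, Q1, r2, r1)
```

For this one example the two expressions agree exactly. A single example is of course not a proof of equivalence in general.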

At this point we justify estimating the other DM's F and G rather than his Q, R, and C. First, we estimate fewer parameters. In estimating F and G, DMi estimates $m_j(n+1)$ parameters, where $m_j$ is the dimension of the other DM's input; in estimating Q, R, and C, DMi estimates $n^2 + m_j^2 + n$ parameters. Second, in the expression for the equilibrium feedback $U_e = F_e X_k + G_e$, $F_e$ and $G_e$ are unique. The corresponding 3-tuple (Q,R,C) which generates the equilibrium feedback is, in general, not unique.

4.  CONVERGENCE  OF  ALGORITHMS 


We have seen that the equilibrium (2.13) leads to a unique Nash equilibrium solution. In this section we examine the possibility of an algorithm converging to a value different from the desired equilibrium. It will be shown that it is not possible for the DM's to follow an equilibrium trajectory while incorrectly estimating each other's control input. This point is important for algorithms which update the estimates $\hat F$ and $\hat G$ based on the error in the estimate of the next state.


4.1.  The  Error  Equations 


Consider the scalar case of (2.1)

    $x_{k+1} = a x_k + b_1 u_{1_k} + b_2 u_{2_k}$.    (4.1)

In this development we assume the noise to be zero, because the probability of the DM's estimating the next state correctly when the system is driven by Gaussian noise is zero.

DM1 sees (4.1) as

    $\hat x_{1_{k+1}} = a x_k + b_1 u_{1_k} + b_2 \hat u_{2_k}$    (4.2)

and DM2 sees the system as

    $\hat x_{2_{k+1}} = a x_k + b_1 \hat u_{1_k} + b_2 u_{2_k}$.    (4.3)

At stage k the DM's apply their inputs and estimate the next state $x_{k+1}$. At stage k+1, the DM's are given the state $x_{k+1}$. Each DM then formulates his error, which is the difference between the state $x_{k+1}$ and the estimate $\hat x_{i_{k+1}}$:

    $e_{1_{k+1}} = x_{k+1} - \hat x_{1_{k+1}}$    (4.4)

    $e_{2_{k+1}} = x_{k+1} - \hat x_{2_{k+1}}$.    (4.5)



Substituting (4.1)-(4.3) into (4.4) and (4.5) we have

    $e_{1_{k+1}} = b_2 (u_{2_k} - \hat u_{2_k})$    (4.6)

    $e_{2_{k+1}} = b_1 (u_{1_k} - \hat u_{1_k})$.    (4.7)

It is clear that, by definition, if the DM's play the equilibrium inputs (2.15)-(2.18), the errors (4.6) and (4.7) are zero. We intend to show that a necessary and sufficient condition for the errors in the estimates to be zero, and remain zero, is that the inputs to (4.1) are the equilibrium inputs. We consider update laws of the form $\hat f_{i_{k+1}} = \hat f_{i_k} + \Phi(e_{k+1})$ and $\hat g_{i_{k+1}} = \hat g_{i_k} + \Theta(e_{k+1})$, where $\Phi$ and $\Theta$ are functions such that $\Phi(0) = \Theta(0) = 0$. This proposition is proved in the following two sections.


4.2.  The  Moving  State  Case 

Clearly, if one of the errors is not zero, then the corresponding DM has incorrectly estimated the other's input. Suppose the errors are both zero and $b_1$ and $b_2$ are not zero. We have

    $e_{1_{k+1}} / b_2 = u_{2_k} - \hat u_{2_k} = f_{2_k} x_k + g_{2_k} - (\hat f_{2_k} x_k + \hat g_{2_k}) = 0$    (4.8)

    $e_{2_{k+1}} / b_1 = u_{1_k} - \hat u_{1_k} = f_{1_k} x_k + g_{1_k} - (\hat f_{1_k} x_k + \hat g_{1_k}) = 0$    (4.9)

which implies

    $f_{2_k} x_k + g_{2_k} = \hat f_{2_k} x_k + \hat g_{2_k}$    (4.10)

    $f_{1_k} x_k + g_{1_k} = \hat f_{1_k} x_k + \hat g_{1_k}$.    (4.11)


From (4.10) and (4.11), if $\hat f_i = f_i$ (i = 1 or 2) then $\hat g_i = g_i$, and if $\hat g_i = g_i$ then $\hat f_i = f_i$, provided $x_k \ne 0$. If this is true for both DM's then, by definition, the DM's are playing the equilibrium inputs. Suppose this is not true for i = 2. (A similar argument holds for i = 1.) Since the estimation errors (4.6) and (4.7) are both zero, the DM's do not update their estimates of f and g for the next input. This implies the inputs given by (2.9)-(2.12) remain the same. The errors of stage k+2 are

    $e_{1_{k+2}} = b_2 (f_{2_{k+1}} x_{k+1} + g_{2_{k+1}} - (\hat f_{2_{k+1}} x_{k+1} + \hat g_{2_{k+1}})) = b_2 (f_{2_k} x_{k+1} + g_{2_k} - (\hat f_{2_k} x_{k+1} + \hat g_{2_k}))$    (4.12)

    $e_{2_{k+2}} = b_1 (f_{1_{k+1}} x_{k+1} + g_{1_{k+1}} - (\hat f_{1_{k+1}} x_{k+1} + \hat g_{1_{k+1}})) = b_1 (f_{1_k} x_{k+1} + g_{1_k} - (\hat f_{1_k} x_{k+1} + \hat g_{1_k}))$.    (4.13)


We now investigate whether the errors (4.12) and (4.13) can again both be zero if DM1 is not estimating DM2's control correctly. Let

    $y_t = f_{2_k} x_t + g_{2_k}$    (4.14)

    $\hat y_t = \hat f_{2_k} x_t + \hat g_{2_k}$,    (4.15)

$t \in \{k, k+1\}$, and assume $\hat f_{2_k} \ne f_{2_k}$ and $\hat g_{2_k} \ne g_{2_k}$. A plot of the linear equations (4.14) and (4.15) given in Figure 1 illustrates the results. Clearly, if $x_{k+1} \ne x_k$ then $y_{k+1} \ne \hat y_{k+1}$ and an error is generated. From the above we know that if there is an error in estimating the next state then the equilibrium inputs are not being applied. Although DM1 was able to estimate the state at k+1 correctly while not using the correct $\hat f_{2_k}$ and $\hat g_{2_k}$, this fault is revealed at stage k+2 if $x_{k+1} \ne x_k$. We see it is possible for both players to estimate the next state correctly even though one DM may not be playing the equilibrium input. However, if the state changes at the next stage, the error surfaces.
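
The moving-state argument amounts to two distinct lines that intersect at one point: a wrong pair of estimated gains can reproduce the other DM's input at a single state, but not at a second, different state. The following sketch illustrates this with assumed numbers; the function name `prediction_error` and all values are hypothetical.

```python
from fractions import Fraction as Fr


def prediction_error(b2, f2, g2, f2_hat, g2_hat, x):
    """DM1's next-state error b2*(u2 - u2_hat) when the inputs are formed at state x."""
    u2 = f2 * x + g2
    u2_hat = f2_hat * x + g2_hat
    return b2 * (u2 - u2_hat)


b2 = Fr(2)
f2, g2 = Fr(-1, 2), Fr(1)            # DM2's actual gains (assumed)
f2_hat, g2_hat = Fr(-1, 4), Fr(1, 2)  # DM1's wrong estimates, chosen to agree at x = 2

error_same_state = prediction_error(b2, f2, g2, f2_hat, g2_hat, Fr(2))
error_moved_state = prediction_error(b2, f2, g2, f2_hat, g2_hat, Fr(3))
```

At x = 2 both lines give the same input, so the error vanishes. Once the state moves to x = 3 the error equals $b_2 (f_2 - \hat f_2)(3 - 2)$, which is nonzero.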


4.3.  The  Constant  State  Case 

We now consider the case when the state remains constant, $x_{k+1} = x_k$. The errors (4.12) and (4.13) reduce to (4.8) and (4.9), which are zero. By simple mathematical induction we see that the state will remain constant for all future stages, the errors will remain zero, and therefore there will be no updating of the controls. DM1 will continue to apply the wrong $\hat f_{2_k}$ and $\hat g_{2_k}$, but he will have no error in his prediction of the next state. Let us reconcile this difficulty by examining the properties of the constant, or steady, state $x_k$. To be more general, let us not require DM2 to estimate $f_{1_k}$ and $g_{1_k}$ correctly either.


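We have

    $f_{2_k} x_k + g_{2_k} - (\hat f_{2_k} x_k + \hat g_{2_k}) = 0$

    $f_{1_k} x_k + g_{1_k} - (\hat f_{1_k} x_k + \hat g_{1_k}) = 0$

    $x_{k+1} = a x_k + b_1 (f_{1_k} x_k + g_{1_k}) + b_2 (f_{2_k} x_k + g_{2_k}) = x_k$

with $f_{1_k}$, $g_{1_k}$, $f_{2_k}$, $g_{2_k}$ given by (2.9)-(2.12), respectively, yielding

    $\left( a + b_1 \left( \frac{-q_1 b_1 (a + b_2 \hat f_{2_k})}{b_1^2 q_1 + r_1} \right) + b_2 \left( \frac{-q_2 b_2 (a + b_1 \hat f_{1_k})}{b_2^2 q_2 + r_2} \right) - 1 \right) x_k + b_1 \left( \frac{-q_1 b_1 (b_2 \hat g_{2_k} - c_1)}{b_1^2 q_1 + r_1} \right) + b_2 \left( \frac{-q_2 b_2 (b_1 \hat g_{1_k} - c_2)}{b_2^2 q_2 + r_2} \right) = 0$    (4.16)

    $\frac{-q_2 b_2 (a + b_1 \hat f_{1_k})}{b_2^2 q_2 + r_2} x_k + \frac{-q_2 b_2 (b_1 \hat g_{1_k} - c_2)}{b_2^2 q_2 + r_2} - (\hat f_{2_k} x_k + \hat g_{2_k}) = 0$    (4.17)

    $\frac{-q_1 b_1 (a + b_2 \hat f_{2_k})}{b_1^2 q_1 + r_1} x_k + \frac{-q_1 b_1 (b_2 \hat g_{2_k} - c_1)}{b_1^2 q_1 + r_1} - (\hat f_{1_k} x_k + \hat g_{1_k}) = 0$.    (4.18)
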

We have three equations in five variables: $x_k$, $\hat f_{1_k}$, $\hat f_{2_k}$, $\hat g_{1_k}$, $\hat g_{2_k}$. It appears we have two degrees of freedom and so let us fix $x_k$ and $\hat f_{1_k}$ to any arbitrary values and solve the remaining system. (Similar results hold if we fix $\hat f_{2_k}$, $\hat g_{1_k}$, or $\hat g_{2_k}$.)

Solving (4.18) for $\hat g_{1_k}$ we obtain

    $\hat g_{1_k} = \frac{-1}{b_1^2 q_1 + r_1} [(q_1 b_1 (a + b_2 \hat f_{2_k}) + (b_1^2 q_1 + r_1) \hat f_{1_k}) x_k + q_1 b_1 (b_2 \hat g_{2_k} - c_1)]$.    (4.19)


Substituting (4.19) into (4.17) and solving for $\hat g_{2_k}$ gives

    $\hat g_{2_k} = \frac{-1}{r_1 r_2 + r_1 b_2^2 q_2 + r_2 b_1^2 q_1} [W_1 x_k + W_2]$    (4.20)

where

    $W_1 = a b_2 q_2 (b_1^2 q_1 + r_1) - q_1 q_2 b_1^2 b_2 (a + b_2 \hat f_{2_k}) + (b_1^2 q_1 + r_1)(b_2^2 q_2 + r_2) \hat f_{2_k}$

    $W_2 = q_1 q_2 b_1^2 b_2 c_1 - q_2 b_2 c_2 (b_1^2 q_1 + r_1)$.

Note that in (4.20) $\hat g_{2_k}$ has no dependence on $\hat f_{1_k}$. Finally, substituting (4.19) and (4.20) into (4.16) we obtain

    $x_k = \frac{q_1 b_1^2 r_2 c_1 + q_2 b_2^2 r_1 c_2}{r_1 q_2 b_2^2 + r_2 q_1 b_1^2 + r_1 r_2 - a r_1 r_2}$.    (4.21)



Note that the variable we were solving for, $\hat f_{2_k}$, drops out and does not appear in (4.21). In its place we have a requirement for the arbitrarily chosen $x_k$. If the state is not at (4.21), then the system of equations (4.16)-(4.18) is inconsistent. For a solution, or a set of solutions, to exist, we must have the constant state at (4.21). We recognize this requirement as the steady state (2.30) of the equilibrium scalar plant. The DM's may estimate each other's control incorrectly with no error, and continue to have no error in estimating the next state, only if the next state remains at the equilibrium steady state.

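Substituting the steady state into the equations (4.16)-(4.18) and denoting the results as the steady state controls, we obtain

    $\hat g_{1_{ss}} = \frac{b_1 q_1 r_2 c_1 (1-a) + b_1 b_2^2 q_1 q_2 (c_1 - c_2) - \hat f_{1_k} (b_1^2 q_1 r_2 c_1 + b_2^2 q_2 r_1 c_2)}{b_1^2 q_1 r_2 + b_2^2 q_2 r_1 + r_1 r_2 - a r_1 r_2}$    (4.22)

    $\hat g_{2_{ss}} = \frac{b_2 q_2 r_1 c_2 (1-a) + b_1^2 b_2 q_1 q_2 (c_2 - c_1) - \hat f_{2_k} (b_1^2 q_1 r_2 c_1 + b_2^2 q_2 r_1 c_2)}{b_1^2 q_1 r_2 + b_2^2 q_2 r_1 + r_1 r_2 - a r_1 r_2}$.    (4.23)

Note that $\hat g_{1_{ss}}$ is a function of $\hat f_{1_k}$ but not of $\hat f_{2_k}$, and $\hat g_{2_{ss}}$ is a function of $\hat f_{2_k}$ but not of $\hat f_{1_k}$. For the original system of equations to be consistent this separation of estimation parameters must occur. We solve for each DM's estimate of the other's control, $\hat u_{i_{ss}} = \hat f_i x_{ss} + \hat g_{i_{ss}}$, from (4.21)-(4.23):

    $\hat u_{1_{ss}} = \frac{b_1 q_1 (b_2^2 q_2 (c_1 - c_2) + r_2 c_1 (1-a))}{b_1^2 q_1 r_2 + b_2^2 q_2 r_1 + r_1 r_2 - a r_1 r_2}$    (4.24)

    $\hat u_{2_{ss}} = \frac{b_2 q_2 (b_1^2 q_1 (c_2 - c_1) + r_1 c_2 (1-a))}{b_1^2 q_1 r_2 + b_2^2 q_2 r_1 + r_1 r_2 - a r_1 r_2}$.    (4.25)
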

We observe that even though $\hat f_{1_k}$ and $\hat f_{2_k}$ can be chosen arbitrarily, the resulting estimates of the steady state inputs are constants and equal to the equilibrium inputs (2.31) and (2.32). For the state to remain at the true equilibrium, the DM's must estimate that the other is playing the equilibrium input. This in turn causes the DM to play his own true equilibrium. Since the DM's are using the equilibrium inputs, they are not penalized for incorrectly estimating f and g.
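
The observation that the estimated steady state input does not depend on the arbitrarily chosen gain estimate can be checked directly. The sketch below uses assumed numerical values and hypothetical function names. It evaluates (4.20) at the steady state (4.21) for several values of DM1's estimate of $f_2$ and forms the estimated input in each case.

```python
from fractions import Fraction as Fr

# Hypothetical scalar game.
a, b1, b2 = Fr(3, 2), Fr(1), Fr(2)
q1, q2, r1, r2 = Fr(1), Fr(1), Fr(1), Fr(1)
c1, c2 = Fr(1), Fr(3)

delta = r1 * q2 * b2**2 + r2 * q1 * b1**2 + r1 * r2

# Steady state (4.21) and DM2's equilibrium steady state input (2.32).
x_ss = (q1 * b1**2 * r2 * c1 + q2 * b2**2 * r1 * c2) / (delta - a * r1 * r2)
u2_ss = (b2 * q2 * (b1**2 * q1 * (c2 - c1) + r1 * c2 * (1 - a))
         / (b1**2 * q1 * r2 + b2**2 * q2 * r1 + (1 - a) * r1 * r2))


def g2_hat_from_4_20(f2_hat, x):
    """Estimate of g2 implied by (4.20) for a chosen f2_hat at state x."""
    w1 = (a * b2 * q2 * (b1**2 * q1 + r1)
          - q1 * q2 * b1**2 * b2 * (a + b2 * f2_hat)
          + (b1**2 * q1 + r1) * (b2**2 * q2 + r2) * f2_hat)
    w2 = q1 * q2 * b1**2 * b2 * c1 - q2 * b2 * c2 * (b1**2 * q1 + r1)
    return -(w1 * x + w2) / delta


def estimated_input(f2_hat):
    """DM1's estimate of DM2's input at the steady state."""
    return f2_hat * x_ss + g2_hat_from_4_20(f2_hat, x_ss)


estimates = [estimated_input(f) for f in (Fr(-5), Fr(0), Fr(7, 3))]
```

Every choice of the gain estimate produces the same estimated input, and that input is the equilibrium value (2.32).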

In summary, we see that if a DM estimates the next state incorrectly then an error has been made in estimating the input parameters. If the DM's estimate the next state correctly and continue to estimate the next state correctly, then the inputs u have been estimated correctly. If the state is changing and the DM's estimate the inputs correctly, then they have also estimated the parameters f and g correctly. If the state is not changing and the DM's estimate the inputs u correctly, then the state is at the steady state (2.30). At the steady state the dynamics of the system are lost. In this case, the DM's do not have to estimate the input parameters correctly when minimizing their costs.

The simple example given below demonstrates the steady state situation. Consider the system

    $x_{k+1} = a x_k + b_1 u_{1_k} + b_2 u_{2_k}$

    $j_{1_k} = q_1 x_{k+1}^2 + r_1 u_{1_k}^2$

    $j_{2_k} = q_2 x_{k+1}^2 + r_2 u_{2_k}^2$

with the control laws

    $u_{1_k} = f_{1_k} x_k$

    $u_{2_k} = f_{2_k} x_k$.

An obvious solution for minimizing the cost functions is $x_k = 0$ and $u_{i_k} = 0$. At the steady state, $x_{ss} = 0$, each DM estimates the other's input will be zero and applies his input $u_i = 0$. The DM's do not have to estimate the equilibrium $f_i$'s; for any $\hat f_i$ their corresponding input will be the correct value, $u_i = 0$.


4.4.  The  Vector  Case 

We now briefly examine the convergence of algorithms for the general vector problem. Following the development of Sections 4.1 and 4.2, we obtain

    $X_{k+1} - \hat X_{1_{k+1}} = B_2 (U_{2_k} - \hat U_{2_k})$    (4.26)

    $X_{k+1} - \hat X_{2_{k+1}} = B_1 (U_{1_k} - \hat U_{1_k})$.    (4.27)

The errors can be zero if the difference $U_{i_k} - \hat U_{i_k}$ is zero or if the difference is in the nullspace of $B_i$. To eliminate the latter possibility, we now require $B_1$ and $B_2$ to have full column rank. This requirement is not too restrictive; it is equivalent to having no redundant control. With this requirement the errors are zero if

    $(F_{2_k} - \hat F_{2_k}) X_k + (G_{2_k} - \hat G_{2_k}) = 0$

and

    $(F_{1_k} - \hat F_{1_k}) X_k + (G_{1_k} - \hat G_{1_k}) = 0$.

We conjecture that results for convergence similar to those obtained for the scalar case could be obtained. The matrix algebra required to justify our conjecture may be formidable. We realize there may be a possibility of converging to incorrect values if $G_{i_k} - \hat G_{i_k}$ lands in the range space of $F_{i_k} - \hat F_{i_k}$, but we believe this is unlikely.

5.  A  PROPOSED  ALGORITHM 


In  this  section  an  algorithm  is  proposed  for  determining  the  Nash 
equilibrium  from  the  reaction  relations  of  the  decision  makers.  Several 
gradient-type  algorithms  are  available  for  estimating  unknown  parameters 
[13], [14].  Convergence  of  these  schemes  has  been  shown  for  the  single  input 
case.  We  maintain  the  spirit  of  these  algorithms  and  extend  them  to  systems 
with  multiple  users. 

We assume a decision maker can remember his previous L inputs, his previous L estimates of the other's inputs, and the previous L states. The number L is a finite memory buffer size whose value depends on the order of the system. For the noiseless case, DM1 can solve (4.26) for DM2's previous input $U_{2_k}$. Knowing $X_k$ and $U_{2_k}$ is not enough to determine $F_{2_k}$ and $G_{2_k}$. However, if $F_{2_k}$ and $G_{2_k}$ are not changing rapidly, DM1 can consider them to be constant over the last L stages. DM1 can then use a least-squares scheme for determining the best estimate of DM2's input parameters $F_{2_k}$ and $G_{2_k}$. We denote these estimates as $\tilde{F}_{2_k}$ and $\tilde{G}_{2_k}$. Now DM1 can use the following updating scheme, where $\hat{F}_{2_k}$ and $\hat{G}_{2_k}$ are his current estimates:

$$\hat{F}_{2_{k+1}} = \hat{F}_{2_k} + \tfrac{1}{2}\,(\tilde{F}_{2_k} - \hat{F}_{2_k}) \qquad (5.1)$$

$$\hat{G}_{2_{k+1}} = \hat{G}_{2_k} + \tfrac{1}{2}\,(\tilde{G}_{2_k} - \hat{G}_{2_k}). \qquad (5.2)$$

If $\tilde{F}_{2_k} = F_{2_k}$, then a simple interpretation of the updating scheme is: DM1's next estimate is the average of his last estimate and DM2's actual last input. We note the scheme in (5.1) and (5.2) has the desirable property of not updating when there is no error in estimating the previous input.
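The least-squares step and the averaging update can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes DM2's input has the affine form $U_2 = F_2 X + G_2$ (the form that appears in the appendix terms) and that the window of L stages is rich enough for the fit to be unique. The function names `fit_reaction` and `half_step` and all numerical values are hypothetical.

```python
import numpy as np

def fit_reaction(states, inputs):
    """Least-squares fit of inputs[j] ~ F @ states[j] + G over the last L stages."""
    X = np.asarray(states, dtype=float)            # shape (L, n)
    U = np.asarray(inputs, dtype=float)            # shape (L, m)
    # A column of ones lets the offset G be fitted together with F.
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    theta, *_ = np.linalg.lstsq(A, U, rcond=None)  # shape (n + 1, m)
    return theta[:-1].T, theta[-1]                 # F is (m, n), G is (m,)

def half_step(estimate, target):
    """Move the running estimate halfway toward the target, as in (5.1)-(5.2)."""
    return estimate + 0.5 * (target - estimate)

# Hypothetical data: DM2 follows u = F2 x + G2 exactly over L = 4 stages.
F2 = np.array([[0.5, -1.0], [0.0, 2.0]])
G2 = np.array([1.0, -3.0])
states = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]])
inputs = states @ F2.T + G2

F_ls, G_ls = fit_reaction(states, inputs)
F_hat = half_step(np.zeros((2, 2)), F_ls)   # running estimate started at zero
G_hat = half_step(np.zeros(2), G_ls)
```

With noiseless data the fit returns `F2` and `G2`, so one update moves a zero initial estimate to half of the true values. When the running estimate already equals the fitted value, `half_step` leaves it unchanged, which is the no-update property noted above.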


The algorithm is initialized with the assumption that the decision makers have similar costs and objectives. DM1's first estimates $\hat{F}_{2_0}$ and $\hat{G}_{2_0}$ are found by replacing DM2's unknown cost parameters $Q_2$, $R_2$, and $C_2$ with DM1's parameters $Q_1$, $R_1$, and $C_1$. DM2 does likewise.

As  noted  above,  convergence  for  the  stochastic  gradient  algorithms 
has  been  proven  for  the  single  user  case.  We  have  not  shown  they  converge 
for  the  multiple  user  case.  However,  from  our  results  of  Section  4,  if  the 
proposed  algorithm  converges,  it  must  converge  to  the  correct  values. 
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6.  A  WORKING  EXAMPLE 


Consider  the  system 


5  9 


2  6' 


U  +(  )U  +  oW  ,  X  -I 

1  3  /  2k  \  9  8  /  2k  k  \  0 


with  cost  functions 


Ji  *  M 


.o  jmdwc  y 


i. 


Here $W_k$ is a diagonal matrix whose entries are drawn from a zero-mean, unit-variance Gaussian distribution. The noise level is scaled by the factor $\sigma$.

The plant matrix, whose first row is (2  5), is not stable because it has an eigenvalue outside the unit circle, and so control is required to stabilize the plant.

In this example each DM knows the other's Q and R but does not know the other's target vector C. The DMs can calculate the equilibrium F's a priori, but must estimate each other's G vector.

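Solving (2.15)-(2.18) and (2.21) for the example above, we obtain

$$F_{1_e} = \begin{pmatrix} 0.055 & -0.153 \\ 0.061 & -0.274 \end{pmatrix}, \qquad G_{1_e} = \begin{pmatrix} 0.249 \\ 3.543 \end{pmatrix}, \qquad X_{ss} = \begin{pmatrix} 9.711 \\ -2.880 \end{pmatrix},$$

$$F_{2_e} = \begin{pmatrix} -0.518 & 0.019 \\ -0.308 & -0.278 \end{pmatrix}, \qquad G_{2_e} = \begin{pmatrix} 2.384 \\ -4.545 \end{pmatrix}.$$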


The following figures show results of simulations with systems having no noise, $\sigma = 0$, and systems having noise, $\sigma = 0.1$. The figures display the actual state $X$ and its estimates $\hat{X}_1$ and $\hat{X}_2$, and the controls $G_1$ and $G_2$ and their estimates $\hat{G}_1$ and $\hat{G}_2$. Since the state and the control $G_i$ are two-dimensional vectors, we display the components of these vectors one at a time. We use the following notation to indicate the components:


The results of simulations with $\sigma = 0$ are given in Figures 2-7. In Figures 2 and 3 we see how the estimates $\hat{X}_{i_k}$ converge to the actual state $X_k$. We also note that the state converges to the true steady state value. Figures 4-7 show the estimates $\hat{G}_{i_k}$ converging to the inputs $G_{i_k}$. We also see that the inputs $G_{i_k}$ converge to the equilibrium values $G_{i_e}$.

The results of simulations with $\sigma = 0.1$ are given in Figures 8-15. Figures 8-11 show the state and its estimates and Figures 12-15 show the control and its estimates. The effect of the noise is evident in the plots of the actual state. The estimates of the state do not fluctuate as much as the actual state, because the actual state is driven by the noise. We again note that the estimates go to their equilibrium values. We observe from Figures 12-15 that the noise affects the estimates $\hat{G}_i$ more severely than the actual controls $G_i$. The controls tend to their equilibrium values, but the estimates vary about the true control.

[Figure 13. A component of the control and its estimate.]

[Figure 14. A component of the control and its estimate.]

The estimates are more sensitive to noise than the actual inputs because the estimates are driven directly by the noise. A DM bases his estimates on his calculation of the other's previous input. In the noiseless case, he can determine the other DM's previous input exactly; however, when there is noise the measurement is corrupted, and he can only estimate the previous control. The functions (2.9)-(2.12), which determine the actual control, act as a filter for the noisy estimates.
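The difference between the noiseless and noisy cases can be illustrated with a short sketch. It assumes a state equation of the form $X_{k+1} = A X_k + B_1 U_{1_k} + B_2 U_{2_k} + \sigma w_k$ with a square, invertible $B_2$, and it treats the noise as a single fixed vector sample `w`. The matrices and vectors below are hypothetical and are not the example's values, and `recover_other_input` is an illustrative name.

```python
import numpy as np

def recover_other_input(A, B1, B2, x_now, x_next, u1):
    """Solve x_next = A x_now + B1 u1 + B2 u2 for u2, ignoring any noise."""
    residual = x_next - A @ x_now - B1 @ u1
    return np.linalg.solve(B2, residual)

# Hypothetical plant and stage data; B2 is square and invertible here.
A = np.array([[1.0, 0.5], [0.0, 1.0]])
B1 = np.array([[1.0, 0.0], [0.0, 1.0]])
B2 = np.array([[2.0, 1.0], [1.0, 3.0]])
x_now = np.array([1.0, -1.0])
u1 = np.array([0.2, 0.1])
u2 = np.array([-0.5, 0.4])
w = np.array([1.0, -2.0])      # one fixed noise sample, for repeatability

def step(sigma):
    """Next state under the assumed state equation with noise level sigma."""
    return A @ x_now + B1 @ u1 + B2 @ u2 + sigma * w

exact = recover_other_input(A, B1, B2, x_now, step(0.0), u1)
noisy = recover_other_input(A, B1, B2, x_now, step(0.1), u1)
error = noisy - u2             # the noise term mapped back through B2
```

With `sigma = 0` the recovered input equals `u2`. With `sigma = 0.1` the recovery is off by the noise sample passed through the inverse of `B2`, so the noise enters the estimate directly, in line with the observation above. If `B2` were not square, a least-squares solve would be needed instead.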


7.  CONCLUSION


A  simple  method  for  determining  the  Nash  equilibrium  from  the 
reaction  relations  of  the  decision  makers  has  been  presented. 

An  equilibrium  was  proposed  and  shown  to  be  equivalent  to  the 
Nash equilibrium. The class of algorithms that update based upon the error in the estimated state was considered, and it was proven that these algorithms cannot converge to an incorrect value. A sample from this class was described and an example worked.

This work leaves open many areas for future work in the study of Nash games. For example, algorithms which have better convergence properties can be studied. Also, problems with different information structures should be investigated.
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APPENDIX 


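PROOF  OF  NASH  EQUILIBRIUM  -  THE  CONSTANT  $K_i$


The constant $K_i$ is given by the following expression:

\begin{align*}
K_i ={}& X_k' A' Q_i A X_k - 2 X_k' A' Q_i C_i + C_i' Q_i C_i + 2 X_k' A' Q_i B_1 (F_{1_e} X_k + G_{1_e}) \\
&+ 2 X_k' A' Q_i B_2 (F_{2_e} X_k + G_{2_e}) + (F_{1_e} X_k + G_{1_e})' B_1' Q_i B_1 (F_{1_e} X_k + G_{1_e}) \\
&+ (F_{2_e} X_k + G_{2_e})' B_2' Q_i B_2 (F_{2_e} X_k + G_{2_e}) \\
&+ 2 (F_{1_e} X_k + G_{1_e})' B_1' Q_i B_2 (F_{2_e} X_k + G_{2_e}) \\
&+ (F_{i_e} X_k + G_{i_e})' R_i (F_{i_e} X_k + G_{i_e}) - 2 C_i' Q_i B_1 (F_{1_e} X_k + G_{1_e}) \\
&- 2 C_i' Q_i B_2 (F_{2_e} X_k + G_{2_e}).
\end{align*}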
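The terms listed for the constant have the shape of an expanded quadratic form. As a hedged check, the sketch below assumes a state equation $X_{k+1} = A X_k + B_1 U_1 + B_2 U_2$, inputs $U_j = F_{j_e} X_k + G_{j_e}$, and a symmetric $Q_i$. Under those assumptions it confirms numerically that the term-by-term sum equals $(X_{k+1} - C_i)' Q_i (X_{k+1} - C_i) + U_i' R_i U_i$. All matrices are random stand-ins, and the reading of $K_i$ as this quadratic is an interpretation rather than something stated in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 2
A = rng.normal(size=(n, n))
B1 = rng.normal(size=(n, m))
B2 = rng.normal(size=(n, m))
M = rng.normal(size=(n, n))
Q = M @ M.T                      # symmetric weight (assumption)
R = np.eye(m)
C = rng.normal(size=n)           # target vector
x = rng.normal(size=n)           # current state X_k
F1, F2 = rng.normal(size=(m, n)), rng.normal(size=(m, n))
G1, G2 = rng.normal(size=m), rng.normal(size=m)
u1 = F1 @ x + G1
u2 = F2 @ x + G2

def expanded(u_own):
    """Sum of the individual terms, written out one by one."""
    z, a, b = A @ x, B1 @ u1, B2 @ u2
    return (z @ Q @ z - 2 * z @ Q @ C + C @ Q @ C
            + 2 * z @ Q @ a + 2 * z @ Q @ b
            + a @ Q @ a + b @ Q @ b + 2 * a @ Q @ b
            + u_own @ R @ u_own
            - 2 * C @ Q @ a - 2 * C @ Q @ b)

def compact(u_own):
    """The same quantity as a single quadratic form in the next state."""
    e = A @ x + B1 @ u1 + B2 @ u2 - C
    return e @ Q @ e + u_own @ R @ u_own

k1 = expanded(u1)   # constant for DM1 under the stated assumptions
k2 = expanded(u2)   # constant for DM2 under the stated assumptions
```

Agreement between `expanded` and `compact` is a quick way to catch a sign or factor-of-two slip in any single term.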
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